PHFR000121 




24.10.2001 



Video coding method and corresponding encoder 



The present invention generally relates to video compression, and more
particularly, to a video coding method applied to a video sequence and provided for use in a
video encoder comprising base layer coding means, receiving said video sequence and
generating therefrom base layer signals that correspond to video objects (VOs) contained in
the video frames of said sequence and constitute a first bitstream suitable for transmission at
a base layer bit rate to a video decoder, and enhancement layer coding means, receiving said
video sequence and a decoded version of said base layer signals and generating therefrom
enhancement layer signals associated with corresponding base layer video signals and
suitable for transmission at an enhancement layer bit rate to said video decoder. More
precisely, it relates to a method allowing to code the VOs of said sequence and comprising
the steps of:

(A) segmenting the video sequence into said VOs;

(B) coding the successive video object planes (VOPs) of each of said VOs, 
said coding step itself comprising sub-steps of coding the texture and the shape of said VOPs, 
said texture coding sub-step itself comprising a first coding operation without prediction for 
the VOPs called intracoded or I-VOPs, coded without any temporal reference to another 
VOP, a second coding operation with a unidirectional prediction for the VOPs called 
predictive or P-VOPs, coded using only a past I- or P-VOP as a temporal reference, and a 
third coding operation with a bidirectional prediction for the VOPs called bidirectional 
predictive or B-VOPs, coded using both past and future I- or P-VOPs as temporal references. 

The invention also relates to computer executable process steps stored on a 
computer readable medium and provided for carrying out such a coding method, to a 
corresponding computer program product, and to a video encoder carrying out said method. 



Temporal scalability is a feature now offered by several video coding
schemes. It is, for example, one of the numerous options of the MPEG-4 video standard. A
base layer is encoded at a given frame rate, and an additional layer, called enhancement layer,
is also encoded, in order to provide the missing frames to form a video signal with a higher
frame rate and thus to provide a higher temporal resolution at the display side. At the
decoding side, only the base layer is usually decoded, but the decoder may also, in addition,
decode the enhancement layer, which allows more frames per second to be output.
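
By way of illustration only, the relation between the two layers can be sketched as follows. The regular split (every second frame assigned to the base layer) and the names split_layers, decoded_frames and base_step are assumptions made for this example; they are not taken from the description or from the MPEG-4 standard.

```python
def split_layers(frame_times, base_step=2):
    """Assign every base_step-th frame to the base layer (assumed regular split)
    and the remaining frames to the enhancement layer."""
    base = [t for i, t in enumerate(frame_times) if i % base_step == 0]
    enhancement = [t for i, t in enumerate(frame_times) if i % base_step != 0]
    return base, enhancement


def decoded_frames(base, enhancement, decode_enhancement):
    """Frames available for display: the base layer alone, or both layers
    merged in display order when the enhancement layer is also decoded."""
    if decode_enhancement:
        return sorted(base + enhancement)
    return list(base)


base, enhancement = split_layers([0, 1, 2, 3, 4, 5])
print(decoded_frames(base, enhancement, decode_enhancement=False))
print(decoded_frames(base, enhancement, decode_enhancement=True))
```

With this assumed split, decoding the base layer alone yields half of the frames, and decoding both layers restores the full frame rate.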

Several structures are used in MPEG-4, for example the video objects (VOs), 
which are the entities that a user is allowed to access and manipulate, and the video object 
planes (VOPs), which are instances of a video object at a given time. In an encoded 
bitstream, different types of VOPs can be found: intra coded VOPs, using only spatial
redundancy, predictive coded VOPs, using motion estimation and compensation from a past 
reference VOP, and bidirectionally predictive coded VOPs, using motion estimation and 
compensation from past and future reference VOPs. As the MPEG-4 video standard is a 
predictive coding scheme, some temporal references have to be defined for each coded non- 
intra VOP. In the single layer case or in the base layer of a scalable stream, temporal 
references are defined by the standard in a unique way, as illustrated in Fig. 1 where, the base
and enhancement layers being designated by (BL) and (EL) respectively, the references for a
P-VOP and a B-VOP are shown (each arrow corresponds to a possible temporal reference).
On the contrary, for the temporal enhancement layer of an MPEG-4 stream, three VOPs can
be taken as a possible temporal reference for the motion prediction: the most recently
decoded VOP of the enhancement layer, or the previous VOP in display order of the base
layer, or the next VOP in display order of the base layer, as also illustrated in Fig. 1 where
these three possible choices are shown for a P-VOP and a B-VOP of the temporal
enhancement layer: one reference has to be selected for each P-VOP of the enhancement
layer and two for each B-VOP of the same layer.
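
As a minimal sketch, the three candidate references of an enhancement layer VOP could be enumerated as shown below. The representation of a VOP by its layer and display time, and the names Candidate and candidate_references, are hypothetical and serve only this example.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    layer: str   # "BL" (base layer) or "EL" (enhancement layer)
    time: float  # display time of the candidate VOP


def candidate_references(current_time: float,
                         base_times: List[float],
                         last_el_time: Optional[float]) -> List[Candidate]:
    """List the possible temporal references of an enhancement layer VOP:
    the most recently decoded EL VOP, the previous BL VOP in display order
    and the next BL VOP in display order (each one only if it exists)."""
    candidates = []
    if last_el_time is not None:
        candidates.append(Candidate("EL", last_el_time))
    past = [t for t in base_times if t < current_time]
    future = [t for t in base_times if t > current_time]
    if past:
        candidates.append(Candidate("BL", max(past)))
    if future:
        candidates.append(Candidate("BL", min(future)))
    return candidates


# Enhancement VOP displayed at time 3, base layer VOPs at times 0, 2, 4, 6,
# most recently decoded enhancement VOP at time 1.
print(candidate_references(3, [0, 2, 4, 6], 1))
```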


It is therefore the object of the invention to propose a video coding method
making it possible to obtain improved prediction accuracy.

To this end the invention relates to a coding method such as defined in the 
introductory paragraph of the description and in which the temporal reference of the 
enhancement layer P-VOPs is selected only as the temporally closest candidate, and the 
temporal references of the enhancement layer B-VOPs are selected as the two temporally 
closest candidates, in each of these two situations without any consideration of the layer these 
candidates belong to. 

The present invention will now be described, by way of example, with
reference to the accompanying drawing in which Fig. 1 illustrates the possible temporal 
references in the case of a scalable MPEG-4 video stream. 


In a predictive coding scheme such as MPEG-4, three types of coding
operations can be performed: either a first one without prediction for the VOPs called
intracoded or I-VOPs, coded without any temporal reference to another VOP, or a second one
with a unidirectional prediction for the VOPs called predictive or P-VOPs, coded using only
a past or a future I- or P-VOP as a temporal reference, or a third one with a bidirectional
prediction for the VOPs called bidirectional predictive or B-VOPs, coded using both past and
future I- or P-VOPs as temporal references. As seen above, during the encoding process, one
reference has to be selected (out of three candidates) for each P-VOP of the enhancement
layer and two for each B-VOP of said layer. It is then decided, according to the invention, to
select these temporal references of the enhancement VOPs only on the basis of a temporal
distance criterion, without any consideration of the layer they belong to. Consequently, the
selection of these temporal references is performed according to the following rules:

(a) for a P-VOP, the reference is the temporally closest candidate;

(b) for a B-VOP, the references are the two temporally closest candidates.
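
Rules (a) and (b) can be sketched, under stated assumptions, as a ranking of the candidates by temporal distance. In this example each candidate is a (layer, display time) pair, the function name select_references is hypothetical, and a tie between two equally distant candidates is resolved by keeping the order in which they were supplied, a choice the description does not specify.

```python
def select_references(vop_type, current_time, candidates):
    """Select the temporal references of an enhancement layer VOP.

    candidates is a list of (layer, display_time) pairs. The layer is
    carried along but deliberately ignored by the ranking: only the
    temporal distance to current_time matters.
    """
    if vop_type == "I":
        return []  # intracoded VOPs use no temporal reference
    count = {"P": 1, "B": 2}[vop_type]
    # sorted() is stable, so equally distant candidates keep their input
    # order (assumed tie-break, not taken from the description).
    ranked = sorted(candidates, key=lambda c: abs(c[1] - current_time))
    return ranked[:count]


candidates = [("EL", 1), ("BL", 2), ("BL", 4)]
print(select_references("P", 3, candidates))  # rule (a)
print(select_references("B", 3, candidates))  # rule (b)
```

In this usage example the enhancement VOP is displayed at time 3, so the base layer VOPs at times 2 and 4 are both closer than the enhancement layer VOP at time 1 and are selected regardless of the layer they belong to.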






The invention also relates to a video encoder allowing to perform the coding
method including the selection step hereinabove described. Such a video encoder receives the
original video signal that is transferred to a base layer encoding unit for generation of a base
layer bitstream and to an enhancement layer encoding unit for generation of an enhancement
layer bitstream. The base layer encoding unit contains a main processing branch, comprising
a motion prediction circuit (said prediction is based on the selected temporal references), a
discrete cosine transform (DCT) circuit, a quantization circuit, an entropy coding circuit, and
a base layer bit buffer that generates the coded base layer bitstream, and a feedback branch
comprising an inverse quantization circuit, an inverse discrete cosine transform (IDCT)
circuit, and a frame memory. Similarly, the enhancement layer encoding unit generates the
enhancement layer bitstreams.

It must be understood that such a video encoder can be implemented in hardware
or software, or by means of a combination of hardware and software. It may then be 
implemented by any type of computer system or other apparatus adapted for carrying out the 
method described herein. A typical combination of hardware and software could be a 
general-purpose computer system with a computer program that, when loaded and executed,
controls the computer system such that it carries out the method described herein. 
Alternatively, a specific use computer, containing specialized hardware for carrying out one 
or more of the functional tasks of the invention, could be utilized. The present invention can 
also be embedded in a computer program medium or product, which comprises all the 
features enabling the implementation of the method described herein, and which - when 
loaded in a computer system - is able to carry out this method. The invention also relates to 
the computer executable process steps stored on such a computer readable medium or 
product and provided for carrying out the described video coding method. Computer 
program, software program, program, program product, or software, in the present context 
mean any expression, in any language, code or notation, of a set of instructions intended to 
cause a system having an information processing capability to perform a particular function 
either directly or after either or both of the following: (a) conversion to another language, 
code or notation, and/or (b) reproduction in a different material form. 



